Clang

Clang.jl automatically generates Julia wrappers for a C library from a collection of C header files. The following C constructs are supported:

# C      => Julia
function => ccall
struct   => struct
enum     => Enum, CEnum
union    => struct
typedef  => type alias of the underlying intrinsic type
# macro  (limited support)
# bitfield (experimental support)
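
To make the mapping above concrete, here is a hand-written sketch of the kind of Julia code each C construct corresponds to. It is an illustration only, not actual Clang.jl output: the names Point, Color, handle_t and c_strlen are made up for this example, and strlen from the C standard library is used only so that the snippet runs without an extra shared library.

```julia
# C: struct point { int x; int y; };
struct Point
    x::Cint
    y::Cint
end

# C: enum color { RED = 0, GREEN = 1 };
@enum Color::Cuint RED=0 GREEN=1

# C: typedef unsigned int handle_t;
const handle_t = Cuint

# C: size_t strlen(const char *s);
function c_strlen(s)
    # symbol name, return type, argument types, argument values
    ccall(:strlen, Csize_t, (Cstring,), s)
end

println(sizeof(Point))      # two Cint fields laid out as in C
println(c_strlen("hello"))  # calls into libc
```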

Quick start

The following example wraps the C headers include/clang-c/*.h into LibClang.jl:

  1. Write the configuration file generator.toml

[general]
library_name = "libclang"
output_file_path = "./LibClang.jl"
module_name = "LibClang"
jll_pkg_name = "Clang_jll"
export_symbol_prefixes = ["CX", "clang_"]

  2. Load the configuration file and generate the wrapper

using Clang.Generators
using Clang.LibClang.Clang_jll

cd(@__DIR__)

# locate the headers shipped with the JLL package
include_dir = normpath(Clang_jll.artifact_dir, "include")
clang_dir = joinpath(include_dir, "clang-c")

# generator options and compiler flags
options = load_options(joinpath(@__DIR__, "generator.toml"))
args = get_default_args()
push!(args, "-I$include_dir")

# collect every *.h file in clang-c
headers = [joinpath(clang_dir, header) for header in readdir(clang_dir) if endswith(header, ".h")]
# alternatively: headers = detect_headers(clang_dir, args)

# create the context and generate the wrapper
ctx = create_context(headers, args, options)

build!(ctx)

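Generator tutorial

Wrapping a JLL package

The most common use of Clang.jl is to expose a Julia interface to a C library managed by a JLL package. A JLL package provides a shared library that can be called with the ccall syntax. The general workflow for wrapping a JLL package is:

  1. Locate the C headers

  2. Find the compiler flags

  3. Create a .toml file with the generator options

  4. Build a context from the three pieces above and run it

  5. Test

Creating a default generator

Generator = Headers + Compiler flags + Generator options

An annotated example of a generator options TOML file: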

[general]
# it could also be an expression as long as `Meta.parse` can parse this string successfully.
# basically, it should be the `expression` in the following code:
# ccall((function_name, expression), returntype, (argtype1, ...), argvalue1, ...)
library_name = "libclang"

# this entry allows you to specify different library names for different headers.
# in the following example:
# library_names = {"config.h" = "libclang_config", "libclang_p.*.h" = "libclang_patch"}
# those functions in the `config.h` will be generated as:
# ccall((function_name, libclang_config), returntype, (argtype1, ...), argvalue1, ...)
library_names = {}

# output file path relative to the working directory
output_file_path = "LibClang.jl"

# if these are set, common file (types and constants) and API file (functions) will be separated
# this is for compatibility, so prologue and epilogue are not supported.
# output_api_file_path = "api.jl"
# output_common_file_path = "common.jl"

# if this entry is not empty, the generator will print the code below to the `output_file_path`.
# module module_name
#
# end # module
module_name = "LibClang"

# if this entry is not empty, the generator will print the code below to the `output_file_path`.
# using jll_pkg_name
# export jll_pkg_name
jll_pkg_name = "Clang_jll"

# for packages that have extra JLL package dependencies
jll_pkg_extra = []

# identifiers that start with one of the strings listed in this entry will be exported.
export_symbol_prefixes = ["CX", "clang_"]

# the code in the following file will be copy-pasted to `output_file_path` before the generated code.
# this is often used for applying custom patches, e.g. adding missing definitions.
prologue_file_path = "./prologue.jl"

# the code in the following file will be copy-pasted to `output_file_path` after the generated code.
# this is often used for applying custom patches.
epilogue_file_path = ""

# node with an id in the `output_ignorelist` will be ignored in the printing passes.
# this is very useful for custom editing.
output_ignorelist = [
    "CINDEX_EXPORTS",
    "CINDEX_VERSION",
    "CINDEX_VERSION_STRING",
    "CINDEX_LINKAGE",
    "CINDEX_DEPRECATED",
    "LLVM_CLANG_C_STRICT_PROTOTYPES_BEGIN",
    "LLVM_CLANG_C_STRICT_PROTOTYPES_END",
    "LLVM_CLANG_C_EXTERN_C_BEGIN",
    "LLVM_CLANG_C_EXTERN_C_END"
]

# Julia's `@enum` does not allow duplicated values, so by default, C enums are translated to
# CEnum.jl's `@cenum`.
# if this entry is true, `@enum` is used and those duplicated enum constants are just commented.
use_julia_native_enum_type = false

# use `@cenum` but do not print `using CEnum`.
# this is useful in the case of using `CEnum` directly in the source tree instead of using `CEnum` as a dependency
print_using_CEnum = true

# Print enums directly as integers without @(c)enum wrapper
# Override above two options
print_enum_as_integer = false

# use deterministic symbol instead of `gensym`-generated `var"##XXX"`
use_deterministic_symbol = true

# by default, only those declarations in the local header file are processed.
# those declarations in the system headers will be treated specially and will be generated if necessary.
# if you'd like to generate all of the symbols in the system headers, please set this option to false.
is_local_header_only = true

# if this option is set to true, C code with a style of
# ```c
# typedef struct {
#     int x;
# } my_struct;
# ```
# will be generated as:
# ```julia
# struct my_struct
#     x::Cint
# end
# ```
# instead of
# ```julia
# struct var"##Ctag#NUM"
#     x::Cint
# end
# const my_struct = var"##Ctag#NUM"
# ```
smart_de_anonymize = true

# if set to true, static functions will be ignored
skip_static_functions = false

# EXPERIMENTAL
# if this option is set to true, those structs that are not necessary to be an
# immutable struct will be generated as a mutable struct.
# this option defaults to false, do read the paragraph below before using this feature.
auto_mutability = false

# add inner constructor `Foo() = new()`
auto_mutability_with_new = true

# if you feel like certain structs should not be generated as mutable struct, please add them in the following list.
# for example, if a C function accepts a `Vector` of some type as its argument like:
#     void foo(mutable_type *list, int n);
# when calling this function via `ccall`, passing a `Vector{mutable_type}(undef, n)` to the first
# argument will trigger a crash, the reason is mutable structs are not stored inline within a `Vector`,
# one should use `Ref{NTuple{n,mutable_type}}()` instead.
# this is not convenient and that's where the `auto_mutability_ignorelist` comes in.
auto_mutability_ignorelist = []

# opposite to `auto_mutability_ignorelist` and has a higher priority
auto_mutability_includelist = []

# if set to "raw", extract and dump raw c comment;
# if set to "doxygen", parse and format doxygen comment.
# note: by default, Clang only parses doxygen comment, pass `-fparse-all-comments` to Clang in order to parse non-doxygen comments.
extract_c_comment_style = "doxygen"

# Pass a function to explicitly generate documentation. It will be called like
# `callback_documentation(node::ExprNode)` if `extract_c_comment_style` is not
# set, or if it is set and no docs were found automatically.
#
# Do *not* set this in the TOML file, it should be set in the generator script
# to a function that takes in an ExprNode and returns a String[] (string
# vector).
# callback_documentation = ""

# if set to true, single line comment will be printed as """comment""" instead of """\ncomment\n"""
fold_single_line_comment = false

# if set to "outofline", documentation of struct fields will be collected at the "Fields" section of the struct
# if set to "inline", documentation of struct fields will go right above struct definition
struct_field_comment_style = "outofline"

# if set to "outofline", documentation of enumerators will be collected at the "Enumerators" section of the enum
enumerator_comment_style = "outofline"

# if set to true, C function prototype will be included in documentation
show_c_function_prototype = false

[codegen]
# map C's bool to Julia's Bool instead of `Cuchar` a.k.a `UInt8`.
use_julia_bool = true

# set this to true if the C routine always expects a NUL-terminated string.
# TODO: support filtering
always_NUL_terminated_string = true

# generate strictly typed function
is_function_strictly_typed = false

# if true, opaque pointers in function arguments will be translated to `Ptr{Cvoid}`.
opaque_func_arg_as_PtrCvoid = false

# if true, opaque types are translated to `mutable struct` instead of `Cvoid`.
opaque_as_mutable_struct = true

# if true, use Julia 1.5's new `@ccall` macro
use_ccall_macro = true

# if true, variadic functions are wrapped with `@ccall` macro. Otherwise variadic functions are ignored.
wrap_variadic_function = false

# generate getproperty/setproperty! methods for the types in the following list
field_access_method_list = []

# the generator will prefix the function argument names in the following list with a "_" to
# prevent the generated symbols from conflicting with the symbols defined and exported in Base.
function_argument_conflict_symbols = []

# emit constructors for all custom-layout structs like bitfield in the list,
# or set to `true` to do so for all such structs
add_record_constructors = []

[codegen.macro]
# it's highly recommended to set this entry to "basic".
# if you'd like to skip all of the macros, please set this entry to "disable".
# if you'd like to translate function-like macros to Julia, please set this entry to "aggressive".
macro_mode = "basic"

# function-like macros in the following list will always be translated.
functionlike_macro_includelist = [
    "CINDEX_VERSION_ENCODE",
]

# if true, the generator prints the following message as comments.
# "# Skipping MacroDefinition: ..."
add_comment_for_skipped_macro = true

# if true, ignore any macro that is suffixed with "_H" or with a suffix in the `ignore_header_guards_with_suffixes` list
ignore_header_guards = true
ignore_header_guards_with_suffixes = []

# if true, ignore those pure definition macros in the C code
ignore_pure_definition = true
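
Since the options file is plain TOML, its structure can be inspected with Julia's TOML standard library. The sketch below parses a trimmed-down options string and shows how a library_name entry can be turned into an expression with Meta.parse and spliced into a ccall expression, as the comment on library_name describes. It assumes the options are handled as a nested dictionary of this shape; function_name and the argument x are placeholders, and the ccall expression is only built, never executed.

```julia
using TOML

# a trimmed-down options file, reusing entries from the example above
options = TOML.parse("""
[general]
library_name = "libclang"
module_name = "LibClang"
export_symbol_prefixes = ["CX", "clang_"]

[codegen.macro]
macro_mode = "basic"
""")

general = options["general"]
println(general["module_name"])
println(options["codegen"]["macro"]["macro_mode"])

# `library_name` must be parseable by `Meta.parse`
lib_expr = Meta.parse(general["library_name"])

# splice it into the library slot of a ccall expression (not evaluated here)
call = :(ccall((:function_name, $lib_expr), Cint, (Cint,), x))
println(call)
```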

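Skipping specific symbols

Some symbols in the C headers may not be handled correctly by Clang.jl. You can skip them and backfill the missing definitions later through the prologue file given by prologue_file_path:

  • add the symbol to output_ignorelist to skip wrapping it;

  • if the symbol lives in a system header and makes Clang.jl error out before printing, add @add_def symbol_name before generating to suppress wrapping it, and post an issue on the Clang.jl GitHub repository.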
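
The first option can also be applied from the generator script rather than by editing the TOML file. The following is a minimal sketch, assuming the loaded options behave like an ordinary nested dictionary; PROBLEM_SYMBOL and the contents of the prologue file are hypothetical, and the TOML standard library stands in for load_options so the snippet runs on its own.

```julia
using TOML

# stand-in for the options loaded from generator.toml
options = TOML.parse("""
[general]
output_ignorelist = ["CINDEX_EXPORTS"]
prologue_file_path = ""
""")

# skip a symbol the generator cannot handle (hypothetical name)
push!(options["general"]["output_ignorelist"], "PROBLEM_SYMBOL")

# backfill a hand-written definition through the prologue file
prologue = joinpath(mktempdir(), "prologue.jl")
write(prologue, "const PROBLEM_SYMBOL = Cint(1)  # hand-written replacement\n")
options["general"]["prologue_file_path"] = prologue

println(options["general"]["output_ignorelist"])
# the modified `options` would then be passed to create_context(headers, args, options)
```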

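Rewriting expressions before printing

Clang.jl's wrapping is split into a generation stage and a printing stage, so the generated expressions can be modified before they are written to a file. Simply run the two stages separately:

# generate only, do not print
build!(ctx, BUILDSTAGE_NO_PRINTING)

# custom rewrite rules
function rewrite!(e::Expr) end
function rewrite!(dag::ExprDAG)
    for node in get_nodes(dag)
        for expr in get_exprs(node)
            rewrite!(expr)
        end
    end
end

rewrite!(ctx.dag)

# print
build!(ctx, BUILDSTAGE_PRINTING_ONLY)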
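
The body of rewrite!(e::Expr) is left empty above. As one possible rule, the sketch below strips a prefix from the Julia-side name of a function definition while leaving the C symbol inside the ccall untouched. The helper strip_prefix! and the function clang_example are hypothetical, and the rule is shown on a hand-written expression so it runs without Clang.jl.

```julia
# Hypothetical rewrite rule: rename `prefix_name(...)` to `name(...)` in a function definition.
function strip_prefix!(e::Expr, prefix::AbstractString)
    # only touch `function name(args...) ... end` definitions
    if e.head === :function && e.args[1] isa Expr && e.args[1].head === :call
        sig = e.args[1]
        fname = string(sig.args[1])
        if startswith(fname, prefix)
            sig.args[1] = Symbol(fname[ncodeunits(prefix)+1:end])
        end
    end
    return e
end

# a hand-written stand-in for a generated wrapper expression (quoted, never executed)
ex = :(function clang_example(c)
    ccall((:clang_example, libclang), Cint, (Ptr{Cvoid},), c)
end)

strip_prefix!(ex, "clang_")
println(ex.args[1])   # the rewritten signature
```

A rule like this could be called from rewrite!(e::Expr) so that it is applied to every expression visited by the loop over the DAG.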

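Multi-platform configuration

  • When some data types are platform-dependent, you can skip them and add them back manually;

  • If the differences are too large to fix by hand, you can generate a wrapper for each platform, as is done in LibClang.jl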
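
When one wrapper is generated per platform, the package has to pick the matching file at load time. The sketch below shows one way this could look; the lib/ directory, the "<arch>-<os>.jl" naming scheme and the helpers host_os and wrapper_path are assumptions made for this example, not the layout Clang.jl actually uses.

```julia
# Hypothetical layout: one generated wrapper per target under lib/, named "<arch>-<os>.jl".
function host_os()
    Sys.iswindows() && return :windows
    Sys.isapple() && return :macos
    Sys.islinux() && return :linux
    return :other
end

wrapper_path(arch::Symbol, os::Symbol; dir::AbstractString = "lib") =
    joinpath(dir, string(arch, "-", os, ".jl"))

path = wrapper_path(Sys.ARCH, host_os())
println("would include: ", path)
# include(path)  # uncomment once the per-platform files exist
```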

Type correspondence

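C type            ccall signature                           Julia type
Int/Float         the same                                  the same
Struct T          a Julia struct T with the same layout     T
Pointer (T*)      Ref{T}/Ptr{T}                             Ref{T}/Ptr{T}/array
String (char*)    Cstring/Ptr{Cchar}                        String

  • Ref is an abstract type in Julia and cannot be passed to C directly

  • To pass a Julia string or array to C, the argument type must be annotated as Ptr{T} (or Cstring); otherwise the Julia struct that represents the String or Array is passed rather than the contents of its buffer. There are two ways to do this:

    1. @ccall: @ccall printf("%s\n"::Cstring; "hello"::Cstring)::Cint

    2. Overload to_c_type to map Julia types to the corresponding ccall signature types. For example, add to_c_type(::Type{String}) = Cstring to the prologue, after which every String is annotated as Cstring:

to_c_type(::Type{<:AbstractString}) = Cstring # or Ptr{Cchar}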
to_c_type(t::Type{<:Union{AbstractArray, Ref}}) = Ptr{eltype(t)}
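
The effect of these annotations can be tried directly with functions from the C standard library. This is a small sketch, assuming strlen and memcpy can be resolved in the running process (as is the case for a standard Julia build); it is independent of Clang.jl.

```julia
# String -> char*: annotate the argument as Cstring
n = @ccall strlen("hello"::Cstring)::Csize_t
println(n)

# Array -> T*: annotate the argument as Ptr{T}, so C receives the data buffer
src = Cint[1, 2, 3]
dst = zeros(Cint, 3)
@ccall memcpy(dst::Ptr{Cint}, src::Ptr{Cint}, sizeof(src)::Csize_t)::Ptr{Cvoid}
println(dst)
```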

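LibClang tutorial

Clang is an open-source compiler built on the LLVM framework, a modern compiler for C, C++ and Objective-C. Clang and LLVM are written in C++, but the Clang project maintains a C interface called libclang that provides access to the AST and type representations.

Clang.jl wraps libclang and provides a C => Julia wrapper generator.

The following uses a small example header to illustrate: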

//example.h
struct ExStruct {
    int    kind;
    char*  name;
    float* data;
};

void* ExFunction (int kind, char* name, float* data){
    struct ExStruct st;
    st.kind = kind;
    st.name = name;
    st.data = data;
    return 0; /* toy example: nothing meaningful to return */
}

Printing the fields of a struct

Parsing the struct above with Clang.jl takes only a few lines of code:

using Clang

# parse the header and get the root cursor of the translation unit
trans_unit = Clang.parse_header(Index(), "example.h")
root_cursor = Clang.getTranslationUnitCursor(trans_unit)
struct_cursor = search(root_cursor, "ExStruct") |> only

# print every child (field) of the struct
for c in children(struct_cursor)
    println("Cursor: ", c,
            "\n Kind: ", kind(c),
            "\n Name: ", name(c),
            "\n Type: ", Clang.getCursorType(c))
end

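trans_unit holds a TranslationUnit, the entry point to the libclang AST. The AST is represented as a DAG of cursor nodes, each carrying three essential pieces of information:

  • Kind: the purpose of the cursor node

  • Type: the type of the object the cursor represents

  • Children: the list of child nodes

root_cursor is the root cursor of the TranslationUnit. In Clang.jl, CLCursor is the abstract cursor type from which all concrete cursor types derive. Under the hood, each cursor (CXCursor) kind and each type kind is an enum value, and these values are used to map cursors and types to Julia types automatically. As a result, you can write multiple-dispatch methods against CLCursor or CLType variables.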

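dump(root_cursor)
dump(Clang.LibClang.CXCursorKind) # cursor kinds are translated to a CEnum

# two ways to access the child nodes of a cursor:
# children(): returns an iterator over the child nodes
children(struct_cursor)

# search(): returns a list of matching child nodes; in most cases there should be exactly one match, so it can be combined with only() as a check
search(root_cursor, "ExStruct")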

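Each CLFieldDecl cursor has an associated CLType object, which can be queried with the type() function (the loop above prints it via Clang.getCursorType).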
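
The multiple-dispatch style described above can be illustrated without Clang.jl. In the sketch below, ToyCursor, ToyFieldDecl, ToyStructDecl and describe are stand-ins invented for this example; they only mimic the idea of one Julia type per cursor kind and are not the Clang.jl definitions.

```julia
# Stand-in types for illustration only; these are NOT the Clang.jl definitions.
abstract type ToyCursor end

struct ToyFieldDecl <: ToyCursor
    name::String
    ctype::String
end

struct ToyStructDecl <: ToyCursor
    name::String
    fields::Vector{ToyFieldDecl}
end

# One method per cursor type; Julia selects the method from the argument's type.
describe(c::ToyFieldDecl) = "$(c.name)::$(c.ctype)"
describe(c::ToyStructDecl) = "struct $(c.name) { " * join(describe.(c.fields), "; ") * " }"
describe(c::ToyCursor) = "unhandled cursor"   # fallback for any other kind

st = ToyStructDecl("ExStruct", [
    ToyFieldDecl("kind", "int"),
    ToyFieldDecl("name", "char *"),
    ToyFieldDecl("data", "float *"),
])
println(describe(st))
```

With Clang.jl, the same pattern would dispatch on the concrete cursor types, such as the CLFieldDecl mentioned above.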

Function arguments and types

To find the cursor for the ExFunction function in example.h above, search for nodes whose cursor kind is CXCursor_FunctionDecl:

using Clang.LibClang  # brings CXCursor_FunctionDecl into scope
fdecl = search(root_cursor, CXCursor_FunctionDecl) |> only

# collect the child cursors of the function declaration
fdecl_children = [c for c in children(fdecl)]
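
To go from the children of the function declaration to its arguments, one option is to keep only the children whose kind is CXCursor_ParmDecl (the libclang cursor kind for parameter declarations) and print their names and types. The sketch below is a self-contained variant of the steps above: it requires Clang.jl to be installed, writes a declaration-only copy of the header to a temporary directory, and otherwise uses only the calls already shown in this tutorial.

```julia
using Clang
using Clang.LibClang   # for CXCursor_FunctionDecl and CXCursor_ParmDecl

# write a declaration-only copy of the header so the snippet is self-contained
dir = mktempdir()
header = joinpath(dir, "example.h")
write(header, """
struct ExStruct {
    int    kind;
    char*  name;
    float* data;
};

void* ExFunction(int kind, char* name, float* data);
""")

trans_unit = Clang.parse_header(Index(), header)
root_cursor = Clang.getTranslationUnitCursor(trans_unit)
fdecl = search(root_cursor, CXCursor_FunctionDecl) |> only

# keep only the parameter declarations among the children
params = [c for c in children(fdecl) if kind(c) == CXCursor_ParmDecl]
for p in params
    println(name(p), " :: ", Clang.getCursorType(p))
end
```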

For information on all the CLCursor and CLType kinds provided by libclang, see the libclang documentation.